-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use file-level expressions and avoid slow loop for whitespace_linter() for up to 50% speedup #2024
Conversation
Hmm, I think the error fixed by b5793a7 also reduced the efficiency improvement. I only saw a 3% improvement just now. |
Can you update the timings after the full change. I doubt they will change much, but still good to be sure |
I'm looking more closely at what the bottleneck is, and c.f. source_expressions <- get_source_expressions("QC.R")$expressions
full_expression <- source_expressions[[length(source_expressions)]]
regex <- "^(?:\\s)*(?:\t)+"
system.time(re_matches(full_expression[["file_lines"]], regex, locations = TRUE, global = TRUE))
# user system elapsed
# 1.956 0.016 1.982
system.time(grepl(regex, full_expression[["file_lines"]]))
# user system elapsed
# 0.01 0.00 0.01 |
I think the issue is system.time(re_matches(full_expression[["file_lines"]], regex, locations = TRUE, global = FALSE))
# user system elapsed
# 0.002 0.000 0.002
|
Bad news: my simple 5-character PR has ballooned substantially Good news: this PR now marks a 400x (!!) speed improvement for QC.R (timings added to the top) |
Codecov Report
@@ Coverage Diff @@
## main #2024 +/- ##
==========================================
- Coverage 98.56% 98.55% -0.01%
==========================================
Files 113 113
Lines 5006 4995 -11
==========================================
- Hits 4934 4923 -11
Misses 72 72
|
I was testing out why our lint suite ran so slow on a particular long file & found somewhat surprisingly that
whitespace_linter()
was the slowest of ~100 linters.We can test this on R's
src/library/tools/R/QC.R
:End-to-end timing including
get_source_expressions()
:So an end-to-end (user-facing) 50% improvement, inclusive of parsing time.
So while this approach is not as cache-friendly, of course checking initial whitespace in a file should not be expensive, so I don't think we lose much by switching to a file-level search.